Abstract

We present a general theory of consistent estimation for possibly
misspecified parametric models based on recent results of Domowitz and White
[17]. This theory extends the unification of Burguete, Gallant and Souza [12] by
allowing for heterogeneous, time dependent data and dynamic models. The theory
is applied to yield consistency results for quasi-maximum likelihood and method
of moments estimators. Of particular interest is a new generalized rank
condition for identifiability.


SEP  3  0  1985 


1.   INTRODUCTION

In the second section of "Misspecified Models with Dependent Observations,"
Domowitz and White [17] provide general results for establishing the consistency
and asymptotic normality of a wide class of estimators obtained as the solution
to an abstract optimization problem. Their results apply to misspecified models
since the model which motivates the optimization problem need not contain the
true data generating process. Because of the somewhat abstract nature of the
sufficient conditions for these results, it can sometimes be difficult to
determine whether a specific estimation technique falls within this class and
therefore has the consequent consistency and asymptotic normality properties.
Here we provide a more convenient theory of estimation specific enough to be
directly applicable to particular estimation techniques applied to potentially
misspecified models and yet sufficiently general so as to cover many estimation
techniques of interest to econometricians. This theory, presented in Section 2,
bridges the gap between the general results of Domowitz and White [17] and the
myriad of single case results which characterize much of the econometrics
literature. It also extends the unification achieved by Burguete, Gallant and
Souza [12] for the case in which the true data generating mechanism is not
subject to drift by allowing the least mean distance and method of moments
estimators, which they treated separately, to be handled in a single framework;
and by allowing for dynamic models and fairly arbitrary data processes which fall
outside their theory. Although our focus here is on the consistency of
estimators, the approach we take also allows a unified treatment of asymptotic
normality and provides a very convenient foundation for addressing the issue of
asymptotic efficiency for various estimators of the parameters of a correctly
specified model. (These issues are addressed in Bates and White [7].)


Sections 3 and 4 present examples of specific estimation techniques covered
by the results of Section 2, which represent extensions of previously published
results. Section 3 deals with the consistency of the (quasi-)maximum likelihood
estimator. Section 4 examines the consistency of m-estimators and of the
generalized method of moments estimator of Hansen [25].

2.   CONSISTENCY  OF  THE  ABSTRACT  ESTIMATOR 

The  available  data  are  assumed  to  be  generated  in  the  following  way: 

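ASSUMPTION A.1: The observed data are a realization of a stochastic process
$\omega \equiv \{W_t : t = 1, 2, \ldots\}$ on a probability space
$(\Omega, \mathcal{F}, P^*)$, where
$\Omega \equiv \times_{t=1}^{\infty} \mathbb{R}^{v} \equiv \mathbb{R}^{v\infty}$,
and $\mathcal{F}$ is the Borel $\sigma$-field generated by the measurable finite
dimensional product cylinders.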

The  probability  measure  P*  provides  a  complete  description  of  the  stochastic 
relationships  holding  among  the  data  both  contemporaneously  and  over  time. 
Usually,  the  "data  generating  mechanism"  described  by  P*  is  unknown,  and  our  goal 
is  to  use  the  available  data  to  learn  about  some  aspect  of  the  data  generating 
mechanism  which  is  of  particular  interest. 

A common way of doing this is to specify a probability model, i.e. a family
of probability measures $\mathbb{P} = \{P_\theta : \theta \in \Theta\}$. This
family is indexed by parameters $\theta$ taking values in some set $\Theta$. For
example, $\mathbb{P}$ might be the family of probability measures whose finite
dimensional distributions are given by

$$P_\theta^n(E) = \int_E f_n(\omega^n, \theta)\, d\mu_n(\omega^n), \qquad E \in \mathcal{B}(\mathbb{R}^{vn}),\ n = 1, 2, \ldots$$

where $f_n : \mathbb{R}^{vn} \times \Theta \to \mathbb{R}^{+}$ is the likelihood
function of the probability model for a sample of size $n$, $\mu_n$ is a
$\sigma$-finite measure on $(\mathbb{R}^{vn}, \mathcal{B}(\mathbb{R}^{vn}))$ and
$\omega^n \equiv (\omega_1, \ldots, \omega_n) \in \mathbb{R}^{vn}$. The notation
$\mathcal{B}(\mathbb{R}^{vn})$ denotes the Borel $\sigma$-field generated by the
open sets of $\mathbb{R}^{vn} \equiv \times_{t=1}^{n} \mathbb{R}^{v}$. This is a
probability model which is completely specified, in the sense that under general
identification conditions, a specific choice for $\theta$ leads to a unique
probability measure $P_\theta \in \mathbb{P}$.

However, interest only in certain features of the data generating mechanism
(e.g. the conditional means) may lead to specifying a probability model
$\mathbb{P}$ such as the collection of all probability measures $P_\theta$ for
which

$$E_\theta(W_t \mid \mathcal{F}_{t-1}) = W_{t-1}\theta, \qquad t = 1, 2, \ldots$$

where $E_\theta(\cdot \mid \cdot)$ is the conditional expectation implied by the
probability measure $P_\theta$ and
$\mathcal{F}_{t-1} \equiv \sigma(W_1, \ldots, W_{t-1})$ is the Borel
$\sigma$-field generated by $W_1, \ldots, W_{t-1}$. This is the vector
autoregression model of order 1 [VAR(1)]. Because this model only specifies the
behavior of conditional expectations under $P_\theta$, it does not yield a
"completely" specified probability model in the same way that specifying a
likelihood function does since a unique probability measure is not determined
for given $\theta$. Nevertheless, if our interest lies solely in conditional
expectations, then such a specification may be sufficiently complete for our
purposes, and identification of the parameters of the conditional mean can be
achieved under general conditions.

In most studies, it is assumed that $P^*$ belongs to $\mathbb{P}$, the specified
probability model, so that the model is correctly specified. Due to the
complexity of economic phenomena, there is little guarantee that the probability
models obtained from the economic theory relevant to the data under consideration
are correctly specified. Thus, we acknowledge from the outset that any specific
probability model adopted is most plausibly treated as misspecified (so that
$P^*$ is not necessarily in $\mathbb{P}$) and that our model is best viewed as a
way of obtaining some kind of approximation to $P^*$. For this reason, any theory
which we develop should rely as little as possible on the suspect probability
model $\mathbb{P}$.

Of course, there is always the hope that one's probability model is
correctly specified. Given this hope, it is natural to construct estimators
based on the probability model that have the property that if $P^*$ is in
$\mathbb{P}$ (e.g. $P^* = P_{\theta_0}$ for some $\theta_0$ in $\Theta$) then the
estimator is consistent for the "true" parameters, $\theta_0$. Such estimators
are frequently constructed by choosing a parameter vector in $\Theta$ which
implies the closest possible correspondence between the behavior exhibited by the
data and that implied for the data by the probability model.

The criterion by which one measures "closeness" allows considerable
latitude. For probability models with a specified likelihood function, closeness
can be measured using the Kullback-Leibler [31] Information Criterion (KLIC),
which provides a measure of the closeness of the specified likelihood function
$f_n(\omega^n, \theta)$ to the joint density of the sample $dP^{*n}/d\mu_n$
implied by the true data generating mechanism. (We define
$P^{*n}(E) \equiv P^*[\omega^n \in E]$, $E \in \mathcal{B}(\mathbb{R}^{vn})$.)
This leads to the method of maximum likelihood, in which an estimator is obtained
by solving the problem

$$\max_{\theta \in \Theta} \ln f_n(\omega^n, \theta).$$

For the VAR(1) model discussed above, closeness may be measured in terms of
how well $W_{t-1}\theta$ approximates $W_t$. An estimator could be obtained by
solving the problem

$$\min_{\theta \in \Theta} \sum_{t=1}^{n} (W_t - W_{t-1}\theta)'(W_t - W_{t-1}\theta)$$

which gives a least squares estimator, or in the case of scalar $W_t$, by solving

$$\min_{\theta \in \Theta} \sum_{t=1}^{n} |W_t - W_{t-1}\theta|.$$

This gives the least absolute deviations estimator.
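As a rough illustration of the two criteria just described, the following sketch computes both estimators for a scalar first-order autoregression. It is not taken from the paper: the simulated data, the function names, and the use of a weighted median to solve the least absolute deviations problem exactly are all assumptions made for the example.

```python
import random


def simulate_ar1(n, theta, seed=0):
    """Simulate w_0, ..., w_n from w_t = theta * w_{t-1} + e_t (hypothetical data)."""
    rng = random.Random(seed)
    w = [0.0]
    for _ in range(n):
        w.append(theta * w[-1] + rng.gauss(0.0, 1.0))
    return w


def least_squares(w):
    """Minimize sum_t (w_t - theta * w_{t-1})^2, which has a closed form."""
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    return num / den


def least_absolute_deviations(w):
    """Minimize sum_t |w_t - theta * w_{t-1}| exactly.

    The objective equals sum_t |w_{t-1}| * |w_t / w_{t-1} - theta| plus a
    constant, so a minimizer is a weighted median of the ratios w_t / w_{t-1}
    with weights |w_{t-1}|.
    """
    pairs = sorted(
        (w[t] / w[t - 1], abs(w[t - 1]))
        for t in range(1, len(w))
        if w[t - 1] != 0.0
    )
    half = 0.5 * sum(weight for _, weight in pairs)
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half:
            return ratio
    raise ValueError("need at least one nonzero lagged value")


if __name__ == "__main__":
    w = simulate_ar1(2000, 0.5, seed=1)
    print("least squares:            ", least_squares(w))
    print("least absolute deviations:", least_absolute_deviations(w))
```

Both functions are instances of the same recipe: pick a measure of closeness between $W_t$ and $W_{t-1}\theta$ and minimize it over $\theta$. Only the measure differs.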

The study of the consequences of applying such estimators to misspecified
probability models was initiated by Huber [28] and Berk [8,9]. Work by White
[42,44,45] examines the implications of model misspecification for ordinary least
squares, nonlinear least squares and maximum likelihood estimation respectively
under convenient assumptions less general than those of Huber, while the recent
work of Burguete, Gallant and Souza [12] provides a unified framework in which
estimation of misspecified models can be conveniently treated.

All of these studies address situations in which the observations are
independent, and aspects of dependence and dynamic misspecification are not
addressed. However, the recent work of Domowitz and White [17] does provide a
framework in which the consequences of misspecification can be studied in the
context of dependent observations.

The starting point for all of this work is the recognition that no matter
how the data may truly be generated, the parametric probability models typically
specified and the estimation criteria typically applied often lead to estimators
obtained as the solution to an optimization problem, in which the optimand is a
function of the observed data, $\omega$, and the parameters $\theta$. Formally,
an estimator $\hat\theta_n$ is the solution to an optimization problem

$$\min_{\theta \in \Theta} Q_n(\omega, \theta).$$

The properties of $\hat\theta_n$ can be studied in the misspecified case by
relying upon the specified probability model only to the extent that it affects
the form of the optimization problem as determined by the choice of estimation
technique, and then, placing as little structure on $P^*$ as possible, using laws
of large numbers and central limit theory to draw conclusions about the behavior
of $\hat\theta_n$.

To study the consistency of $\hat\theta_n$ we rely on the heuristic insight that
because $\hat\theta_n$ minimizes $Q_n$ and because $Q_n$ can generally be shown
to tend to a real function $\bar{Q}_n : \Theta \to \mathbb{R}$ as
$n \to \infty$ under mild conditions on $P^*$, then $\hat\theta_n$ should tend to
$\theta_n^*$ (say) which minimizes $\bar{Q}_n$.

In order for this heuristic argument to work, we require two things: that $Q_n$
converges appropriately to some function $\bar{Q}_n$; and that $\bar{Q}_n$ has a
well defined (i.e. appropriately unique) minimum. For the first item, the
appropriate convergence is that
$Q_n(\omega, \theta) - \bar{Q}_n(\theta) \to 0$ a.s. uniformly on $\Theta$. For
the second item, the appropriate uniqueness condition is supplied using the
following definition of Domowitz and White [17].


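DEFINITION 2.1. Let $\bar{Q}_n(\theta)$ be a real-valued continuous function on
a compact metric space $\Theta$ such that $\bar{Q}_n(\theta)$ has a minimum at
$\theta_n^*$, $n = 1, 2, \ldots$. Let $S_n(\varepsilon)$ be an open sphere
centered at $\theta_n^*$ with fixed radius $\varepsilon > 0$. For each
$n = 1, 2, \ldots$, define the neighborhood
$\eta_n(\varepsilon) = S_n(\varepsilon) \cap \Theta$. Its complement in
$\Theta$, $\eta_n^c(\varepsilon)$, is compact. The minimizer $\theta_n^*$ is said
to be identifiably unique if and only if for any $\varepsilon > 0$

$$\liminf_{n \to \infty} \Big[ \min_{\theta \in \eta_n^c(\varepsilon)} \bar{Q}_n(\theta) - \bar{Q}_n(\theta_n^*) \Big] > 0.$$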


Note that this is a global concept which essentially says that there exists a
supporting hyperplane of the epigraph of $\bar{Q}_n$ which is uniformly bounded
away from the epigraph of $\bar{Q}_n$ outside every neighborhood of a unique
supporting point $\theta_n^*$. As such, it is easily seen to be a minimal
requirement in that if it is not satisfied, a sequence $\{\theta_n\}$ could
always be constructed such that the distance between $\theta_n^*$ and $\theta_n$
does not approach zero even though
$|\bar{Q}_n(\theta_n^*) - \bar{Q}_n(\theta_n)|$ does.
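A simple worked example, not taken from the paper, may help to separate identifiable uniqueness from ordinary uniqueness of the minimizer. The parameter space $\Theta = [-1, 1]$ and both objective functions below are assumptions chosen purely for illustration, with $0 < \varepsilon \le 1$ throughout.

```latex
% Case 1: a fixed quadratic. The minimizer is \theta_n^* = 0 for every n and
\bar{Q}_n(\theta) = \theta^2
\quad\Longrightarrow\quad
\min_{|\theta| \ge \varepsilon} \bar{Q}_n(\theta) - \bar{Q}_n(0) = \varepsilon^2 ,
\qquad
\liminf_{n \to \infty} \varepsilon^2 = \varepsilon^2 > 0 .

% Case 2: a quadratic that flattens as n grows. The minimizer is still unique
% for every n, but the gap vanishes in the limit:
\bar{Q}_n(\theta) = \theta^2 / n
\quad\Longrightarrow\quad
\min_{|\theta| \ge \varepsilon} \bar{Q}_n(\theta) - \bar{Q}_n(0) = \varepsilon^2 / n ,
\qquad
\liminf_{n \to \infty} \varepsilon^2 / n = 0 .
```

In the first case $\theta_n^* = 0$ is identifiably unique. In the second it is not, and the constant sequence $\theta_n = 1$ stays at distance one from $\theta_n^*$ even though $|\bar{Q}_n(\theta_n^*) - \bar{Q}_n(\theta_n)| = 1/n \to 0$.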

For  convenience,  we  restate  the  consistency  result  of  Domowitz  and  White 
[17]. 


THEOREM 2.2. Given Assumption A.1, assume:

Assumption 2.2.i: For each $\theta$ in $\Theta$, a compact subset of
$\mathbb{R}^k$, $Q_n(\omega, \theta)$ is a measurable function on $\Omega$ and,
for each $\omega \in \Omega$, a continuous function on $\Theta$,
$n = 1, 2, \ldots$.

Then there exists a measurable function $\hat\theta_n(\omega)$ such that

$$Q_n(\omega, \hat\theta_n(\omega)) = \inf_{\theta \in \Theta} Q_n(\omega, \theta) \quad \text{for all } \omega \in \Omega.$$

If, in addition,

Assumption 2.2.ii: $|Q_n(\omega, \theta) - \bar{Q}_n(\theta)| \to 0$ a.s. as
$n \to \infty$, uniformly on $\Theta$; and

Assumption 2.2.iii: $\bar{Q}_n(\theta)$ has an identifiably unique minimizer
$\theta_n^*$, $n = 1, 2, \ldots$;

then $\hat\theta_n - \theta_n^* \to 0$ a.s. as $n \to \infty$.


The form and proof of this result owe much to the antecedent work of Wald [39],
Jennrich [29], Hoadley [27] and Amemiya [2] for estimation of correctly specified
models.




The requirement that $\bar{Q}_n(\theta)$ have an identifiably unique minimizer
allows "identification" of the parameters $\theta_n^*$ only in the sense that the
objective function is not allowed to become too flat in the neighborhood of its
minimum. This is quite distinct from the notion of identification which arises in
the estimation of the "true" parameters of a correctly specified model. There,
the knowledge of $\mathbb{P}$ makes it possible to discuss meaningfully the
identification of true parameters without reference to the properties of the
optimand defining the estimator, although lack of identification does lead to
optimands with non-unique minima. When the model is correctly specified, it is
typically straightforward to place convenient and plausible primitive conditions
on the model to obtain identification of true parameters (as in Wald [39],
Jennrich [29], Hoadley [27], and Amemiya [2]) and then verify Assumption 2.2.iii
(or an appropriate analog) under appropriate primitive conditions on the
estimator (as these authors also do).

In the present context, $\theta_n^*$ does not necessarily correspond to any
"true" parameters, but instead is determined by the interaction of the
probability model specified, the estimation technique chosen, and the behavior of
the underlying stochastic process described by the unknown probability measure
$P^*$. In particular, given a specific probability model and specific data
generation mechanism, the sequence $\{\theta_n^*\}$ will generally differ under
different choices of estimators, e.g. ordinary least squares vs. least absolute
deviations. (In fact, such variability can be a useful indicator of model
specification problems.)

Thus, it is useful and important to distinguish between the concept of the
identification of the parameters of a correctly specified model, which is a
property of the probability model not requiring reference to any particular
estimation technique, and the concept of identifiability of the parameters of a
possibly misspecified model, which results when the probability model and
estimation technique chosen interact to produce a uniquely determined parameter
vector or sequence of parameter vectors to which the estimator tends.

When estimating the parameters of a correctly specified model,
identification of the true parameters is a necessary condition for their
identifiability, defined in this way. It is not a sufficient condition because
the estimator may be constructed in such a way as not to be consistent for the
true parameters, but for some other parameters which are nevertheless
"identifiable". Moreover, a model may fail to be identified, but estimation of a
misspecified version of the model may yield identifiable parameters. For
example, suppose the correctly specified model is

$$Y_t = X_t\theta_0 + \varepsilon_t$$

where $E(X_t'\varepsilon_t) \neq 0$, and no instrumental variables are available
for $X_t$ (e.g. because $X_t = Y_t\gamma + u_t$ also, as in the elementary supply
and demand model where $X_t$ is price and $Y_t$ is quantity). In this model
$\theta_0$ is not identified. Nevertheless, the parameter

$$\theta^* = E(X_t'X_t)^{-1} E(X_t'Y_t)$$

is identifiable when the method of ordinary least squares is applied to i.i.d.
observations on $X_t$ and $Y_t$ in an attempt to estimate $\theta_0$.
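The distinction can be checked numerically. The sketch below is an illustration only: it assumes scalar $X_t$ and $Y_t$, independent standard normal $\varepsilon_t$ and $u_t$, and hypothetical values $\theta_0 = 1$ and $\gamma = -0.5$, none of which come from the paper. It solves the two equations for the reduced form of $X_t$, computes $\theta^*$ analytically, and compares it with ordinary least squares on simulated data.

```python
import random


def pseudo_true_value(theta0, gamma, var_eps, var_u):
    """theta* = E(X^2)^{-1} E(XY) when Y = X*theta0 + eps and X = Y*gamma + u."""
    # Solving the two equations gives X = (gamma*eps + u) / (1 - gamma*theta0).
    d = 1.0 - gamma * theta0
    e_x2 = (gamma ** 2 * var_eps + var_u) / d ** 2
    e_x_eps = gamma * var_eps / d  # E(X*eps), nonzero whenever gamma != 0
    return theta0 + e_x_eps / e_x2


def ols_on_simulated_data(n, theta0, gamma, seed=0):
    """OLS slope (no intercept) of Y on X for n i.i.d. simulated draws."""
    rng = random.Random(seed)
    d = 1.0 - gamma * theta0
    sxx = sxy = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        x = (gamma * eps + u) / d
        y = x * theta0 + eps
        sxx += x * x
        sxy += x * y
    return sxy / sxx


if __name__ == "__main__":
    theta0, gamma = 1.0, -0.5  # hypothetical values for illustration
    print("theta_0 :", theta0)
    print("theta*  :", pseudo_true_value(theta0, gamma, 1.0, 1.0))
    print("OLS     :", ols_on_simulated_data(20000, theta0, gamma, seed=0))
```

Under these assumed values the least squares estimate settles near $\theta^*$ rather than $\theta_0$, which is the sense in which $\theta^*$ is identifiable while $\theta_0$ is not identified.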

In the present context in which the probability model is suspect (or
equivalently, in which the estimation technique chosen does not consistently
estimate the true parameters of an otherwise correctly specified probability
model), the concept of identification loses its immediacy, and the concept of
identifiability acts as an appropriate replacement. Nevertheless, if the model
is correctly specified or incorrectly specified in some irrelevant way, it is
usually quite easy to verify Assumption 2.2.iii using appealing primitive
conditions on the model and estimator, as discussed by White [48, chapter 5] in a
context closely related to that presented here. We provide a result in Section 3
below which provides such primitive conditions for method of moments and
m-estimators not treated by White [48].

The  difficulty  in  applying  Theorem  2.2  is  the  need  to  verify  Assumptions 
2.2.i  through  2.2.iii.   Our  goal  here  is  to  simplify  this  task  by  presenting 
convenient  sufficient  conditions  for  Assumption  2.2.i  through  2.2.iii  which  are 
nevertheless  sufficiently  general  to  cover  many  estimators  of  interest  to 
economists.   The  estimators  considered  here  are  defined  as  follows: 


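ASSUMPTION A.2: The estimator $\hat\theta_n$ is the solution to the problem

$$\min_{\theta \in \Theta} Q_n(\omega, \theta) \equiv g_n\Big(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta),\ \hat\pi_n(\omega)\Big) \qquad (2.1)$$

where

(A.2.i) $\{g_n : \mathbb{R}^{\ell} \times \mathbb{R}^{p} \to \mathbb{R}\}$ is
continuous uniformly in $n$;

(A.2.ii) $q_t : \Omega \times \Theta \to \mathbb{R}^{\ell}$ is measurable for
each $\theta \in \Theta$, a compact subset of $\mathbb{R}^k$, and continuous on
$\Theta$ for each $\omega \in \Omega$, uniformly in $t = 1, 2, \ldots$;

(A.2.iii) $\hat\pi_n : \Omega \to \mathbb{R}^{p}$ is measurable,
$n = 1, 2, \ldots$.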


With this condition, it is straightforward to verify Assumption 2.2.i.

COROLLARY 2.3 (Existence): Given Assumptions A.1 and A.2, Assumption 2.2.i is
satisfied, and there exists a measurable function $\hat\theta_n(\omega)$ such
that

$$Q_n(\omega, \hat\theta_n(\omega)) = \inf_{\theta \in \Theta} Q_n(\omega, \theta) \quad \text{for all } \omega \text{ in } \Omega.$$

Proofs of all results are given in the Mathematical Appendix.

The form given in (2.1) is quite flexible. For example, letting
$g_n(\psi, \pi) = -\psi$ and letting $q_t$ be the (quasi-)log-likelihood of an
observation yields the (quasi-)maximum likelihood estimator. Letting
$\pi = \mathrm{vech}\, P$ for symmetric matrices $P$ and letting
$g_n(\psi, \pi) = \psi' P \psi$ yields instrumental variables, method of moments
or m-estimators.
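The two special cases just described can be written against a single evaluation routine. The sketch below is a hypothetical illustration, not the authors' code: the function names, the Gaussian quasi-log-likelihood, the two moment functions, and the simplification of passing the matrix $P$ directly instead of $\mathrm{vech}\, P$ are all assumptions.

```python
import math


def optimand(g, q, data, theta, pi):
    """Evaluate Q_n(theta) = g(n^{-1} sum_t q(w_t, theta), pi) as in form (2.1)."""
    terms = [q(w, theta) for w in data]  # each q(...) returns a list
    psi = [sum(column) / len(data) for column in zip(*terms)]
    return g(psi, pi)


# (Quasi-)maximum likelihood: q_t is a quasi-log-likelihood, g(psi, pi) = -psi.
def gaussian_quasi_loglik(w, theta):
    y, x, _ = w
    return [-0.5 * math.log(2.0 * math.pi) - 0.5 * (y - x * theta) ** 2]


def negate(psi, pi):
    return -psi[0]


# Method of moments: q_t stacks moment functions, g(psi, pi) = psi' P psi.
def moment_functions(w, theta):
    y, x, z = w  # z plays the role of a hypothetical instrument
    resid = y - x * theta
    return [resid, z * resid]


def quadratic_form(psi, P):
    size = len(psi)
    return sum(psi[i] * P[i][j] * psi[j] for i in range(size) for j in range(size))


if __name__ == "__main__":
    data = [(2.0, 1.0, 1.0), (4.0, 2.0, 0.5), (6.0, 3.0, 2.0)]  # toy (y, x, z)
    P = [[1.0, 0.0], [0.0, 1.0]]
    for theta in (1.0, 2.0):
        qml = optimand(negate, gaussian_quasi_loglik, data, theta, None)
        mom = optimand(quadratic_form, moment_functions, data, theta, P)
        print(theta, qml, mom)
```

Only the pair `(g, q)` changes between the two estimators. The averaging step in `optimand` is shared, and it is this average to which a uniform law of large numbers is later applied.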

It is typically convenient to view $q_t$ as a composition of functions,
e.g.

$$q_t(\omega, \theta) = s_t(W^t(\omega), \theta)$$

where $W^t(\omega) \equiv (W_1(\omega), \ldots, W_t(\omega)) = \omega^t$ and
$s_t : \mathbb{R}^{vt} \times \Theta \to \mathbb{R}^{\ell}$ is appropriately
measurable on $\mathbb{R}^{vt}$ and continuous on $\Theta$. This allows us to see
immediately that the least mean distance estimators of Burguete, Gallant and
Souza [12] fall into the present class. In their notation, a least mean distance
estimator minimizes an optimand of the form

$$Q_n(\omega, \theta) = n^{-1} \sum_{t=1}^{n} s(x_t, y_t, \hat\tau_n(\omega), \theta).$$

For many of these estimators (and in particular the feasible GLS estimators)

$$s(x_t, y_t, \hat\tau_n(\omega), \theta) = s_1(x_t, y_t, \theta) + s_2[\hat\tau_n(\omega)] + s_3(\hat\tau_n(\omega))' s_4(x_t, y_t, \theta)$$

where $s_1$ and $s_2$ are scalar functions and $s_3$ and $s_4$ are finite
dimensional vectors. In this case

$$n^{-1} \sum_{t=1}^{n} s(x_t, y_t, \hat\tau_n(\omega), \theta) = a(\hat\tau_n(\omega))'\, n^{-1} \sum_{t=1}^{n} b(x_t, y_t, \theta)$$

where $a' \equiv (1, s_2, s_3')$ and $b' \equiv (s_1, 1, s_4')$. Setting
$g_n(\psi, \pi) \equiv a(\pi)'\psi$, $q_t(\omega, \theta) \equiv b(W_t(\omega), \theta)$,
$W_t(\omega) \equiv (x_t, y_t) = \omega_t$ and $\hat\pi_n = \hat\tau_n$, we see
that this is in the form (2.1).

A similar argument applies to many of the method of moments estimators
discussed by Burguete, Gallant and Souza. An important exception, however, is
the case of m-estimators which use a preliminary estimate of scale. These
estimators are obtained as solutions to systems of equations

$$n^{-1} \sum_{t=1}^{n} m_2(x_t, y_t, \hat\tau_n, \theta) = 0$$

where $\hat\tau_n$ enters in a nonlinear manner and cannot be factored out in the
way shown above. Nevertheless, if $\hat\tau_n$ is itself an m-estimator solving
the equations

$$n^{-1} \sum_{t=1}^{n} m_1(x_t, y_t, \tau) = 0$$

then the scaled m-estimator can be obtained as the solution to the system of
equations

$$n^{-1} \sum_{t=1}^{n} m(x_t, y_t, \tau, \theta) = 0$$

where $m(x_t, y_t, \tau, \theta)' = (m_1(x_t, y_t, \tau)', m_2(x_t, y_t, \tau, \theta)')$.
Thus, the scaled m-estimator can be viewed as a particular m-estimator not
involving nuisance parameters, and it therefore falls into the present framework.
Similar techniques can be applied to least mean distance estimators which contain
nuisance parameters $\tau_n$ entering nonlinearly. Thus, the estimators
considered by Burguete, Gallant and Souza (and in particular all those they
enumerate in Section 4) can be embedded in the present class.




Common feasible GLS estimators for the more general setting allowed here
also fall into our class of estimators. For example, consider a system of linear
regression equations

$$Y_t = X_t\theta + \varepsilon_t$$

in which the $\ell \times 1$ vector of errors $\varepsilon_t$ is thought to have
a VAR(1) structure

$$\varepsilon_t = R\,\varepsilon_{t-1} + \eta_t,$$

where $\eta_t$ is assumed i.i.d., independent of $\varepsilon_{t-1}$ and of
$X_\tau$, $\tau = 1, 2, \ldots$, with covariance matrix $\Sigma$.

For this model, one could construct a feasible GLS estimator by obtaining
consistent estimates of $R$ and $\Sigma$, say $\hat{R}_n$ and $\hat\Sigma_n$, and
then solving the problem

$$\min_{\theta \in \Theta} (n-1)^{-1} \sum_{t=2}^{n} \big(Y_t - X_t\theta - \hat{R}_n(Y_{t-1} - X_{t-1}\theta)\big)' \hat\Sigma_n^{-1} \big(Y_t - X_t\theta - \hat{R}_n(Y_{t-1} - X_{t-1}\theta)\big).$$

This is a least mean distance estimator which has the same form as the feasible
GLS estimators discussed by Burguete, Gallant and Souza, where $\hat\tau_n$
corresponds to $\hat{R}_n$, $\hat\Sigma_n$. Convenient choices for $\hat{R}_n$
and $\hat\Sigma_n$ can be constructed using

$$\hat\theta_{OLS} = (X'X)^{-1} X'Y$$

where $X$ is the $\ell n \times k$ matrix

$$X' = [X_1', \ldots, X_n']$$

and $Y$ is the $\ell n \times 1$ vector

$$Y' = [Y_1', \ldots, Y_n'].$$

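Letting

$$\hat{e}' = [\hat{e}_1, \ldots, \hat{e}_n], \qquad \hat{e}_{-1}' = [0, \hat{e}_1, \ldots, \hat{e}_{n-1}],$$

where $\hat{e}_t = Y_t - X_t\hat\theta_{OLS}$, we form

$$\hat{R}_n' = (\hat{e}_{-1}'\hat{e}_{-1})^{-1}\hat{e}_{-1}'\hat{e}$$

and

$$\hat\Sigma_n = (\hat{e} - \hat{e}_{-1}\hat{R}_n')'(\hat{e} - \hat{e}_{-1}\hat{R}_n') / (n-1).$$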


Note that this model only motivates our choice of estimator. It need not be
correctly specified (e.g., $\varepsilon_t$ could in fact be VAR(2)). Other
feasible GLS estimators (e.g. those correcting for heteroskedasticity as in
Chapter VII of White [46]) can be similarly treated.

To establish Assumption 2.2.ii for (2.1) we provide conditions which ensure
the convergence of the arguments of $g_n$, i.e.
$n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)$ and $\hat\pi_n$, and then verify that
$g_n$ is sufficiently well behaved to ensure the convergence of $Q_n$. This
verification is accomplished using the following lemma, a generalization of
Proposition 2.16 of White [46].


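LEMMA 2.4. Let $\{g_n : \mathbb{R}^{\ell} \to \mathbb{R}^{m}\}$ be continuous
uniformly in $n$. Suppose for all $n = 1, 2, \ldots$ there exists
$\psi_n : \Omega \times \Theta \to \mathbb{R}^{\ell}$ such that for each
$\theta \in \Theta$, a compact subset of $\mathbb{R}^k$, $\psi_n(\omega, \theta)$
is measurable on $\Omega$, and for each $\omega \in \Omega$,
$\psi_n(\omega, \theta)$ is continuous on $\Theta$. Also suppose for all
$n = 1, 2, \ldots$, there exists continuous
$\bar\psi_n : \Theta \to \mathbb{R}^{\ell}$ such that
$\psi_n(\omega, \theta) - \bar\psi_n(\theta) \to 0$ a.s. as $n \to \infty$,
uniformly on $\Theta$. Finally, suppose that for all $\theta \in \Theta$,
$\bar\psi_n(\theta)$ is interior to $\Psi$, a compact subset of
$\mathbb{R}^{\ell}$, uniformly in $n$.

Then $g_n(\psi_n(\omega, \theta)) - g_n(\bar\psi_n(\theta)) \to 0$ a.s. as
$n \to \infty$, uniformly on $\Theta$.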


Given the continuity of $g_n$ uniformly in $n$ imposed in Assumption A.2.i, it
will suffice that $n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)$ and $\hat\pi_n$
converge uniformly on $\Theta$. Since $\hat\pi_n$ is independent of $\theta$, it
does so trivially. The uniform convergence of
$n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)$ can be established using a uniform
law of large numbers. A variety of such laws is available. Choice of the
appropriate result depends on the behavior assumed for the stochastic process
generating the data. If the process is assumed stationary and ergodic, the
following generalization of Hoadley's [27] uniform law of large numbers is
available.

THEOREM 2.5: Given Assumptions A.1 and A.2.ii, assume:

Assumption 2.5.i: For all $t$, $q_t(\omega, \theta) = q(T^{t-1}\omega, \theta)$
where $T : \Omega \to \Omega$ is measure preserving and one-to-one, and there
exists $d$ measurable-$\mathcal{F}$ such that
$d_t(\omega) \equiv d(T^{t-1}\omega)$ and
$|q_t(\omega, \theta)| \le d_t(\omega)$ for all $\theta$ in $\Theta$, where $d_t$
is integrable;

Assumption 2.5.ii: $\{W_t\}$ is a stationary ergodic process such that
$W_t(\omega) = W_1(T^{t-1}\omega)$.

Then $\{n^{-1} \sum_{t=1}^{n} E[q_t(\omega, \theta)]\}$ is equicontinuous on
$\Theta$ and
$n^{-1} \sum_{t=1}^{n} [q_t(\omega, \theta) - E(q_t(\omega, \theta))] \to 0$ a.s.
as $n \to \infty$ uniformly on $\Theta$.

If the process is not assumed to be stationary but is instead heterogeneous,
then an extension of Hoadley's [27] uniform law of large numbers due to Domowitz
and White [17] is available. This result uses the following definitions.

DEFINITION 2.6: Measurable functions $d_t : \Omega \to \mathbb{R}^{\ell}$ are
$r+\delta$-integrable uniformly in $t$ if and only if
$E|d_{ti}(\omega)|^{r+\delta} \le \Delta < \infty$ for all $t = 1, 2, \ldots$,
$i = 1, \ldots, \ell$.


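DEFINITION 2.7: The mixing coefficients $\phi(m)$ and $\alpha(m)$ are defined as

$$\phi(m) \equiv \sup_n \Big[ \sup_{\{F \in \mathcal{F}_{-\infty}^{n},\ G \in \mathcal{F}_{n+m}^{\infty},\ P(F) > 0\}} |P(G \mid F) - P(G)| \Big]$$

$$\alpha(m) \equiv \sup_n \Big[ \sup_{\{F \in \mathcal{F}_{-\infty}^{n},\ G \in \mathcal{F}_{n+m}^{\infty}\}} |P(GF) - P(F)P(G)| \Big]$$

where $\mathcal{F}_{-\infty}^{n} \equiv \sigma(\ldots, W_n)$ and
$\mathcal{F}_{n+m}^{\infty} \equiv \sigma(W_{n+m}, \ldots)$. If
$\phi(m) = O(m^{-\lambda})$ for $\lambda > a$ then $\phi(m)$ is said to be of
size $a$, and similarly for $\alpha(m)$.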
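Two short worked examples, offered as illustrations and not taken from the paper, show how the size of a mixing coefficient is read off from its rate of decay. They read the definition as requiring $\phi(m) = O(m^{-\lambda})$ for some $\lambda > a$, which is an interpretive assumption.

```latex
% Geometric decay: for 0 < \rho < 1 and any \lambda > 0,
\phi(m) = \rho^{m}
\quad\Longrightarrow\quad
m^{\lambda} \rho^{m} \to 0 \ \text{as } m \to \infty ,
\quad\text{so}\quad
\phi(m) = O(m^{-\lambda}) \ \text{for every } \lambda > 0 .
% Hence \phi(m) is of size a for every a > 0.

% Polynomial decay:
\phi(m) = m^{-2}
\quad\Longrightarrow\quad
\phi(m) = O(m^{-\lambda}) \ \text{exactly when } \lambda \le 2 .
% Hence \phi(m) is of size a for every a < 2, but not of size 2.
```

A larger size therefore corresponds to faster decay of dependence, and geometrically decaying coefficients satisfy any size requirement.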

The mixing coefficients $\phi(m)$ and $\alpha(m)$ measure the amount of
dependence between events in the data generation process separated by at least
$m$ time periods. For more on mixing, see White [46]. To state the following
result, we adopt the notation
$\mathcal{F}_{t-\tau}^{t} \equiv \sigma(W_{t-\tau}, \ldots, W_t)$. Theorem 2.5 of
Domowitz and White [17] can now be stated.

THEOREM 2.8: Given Assumptions A.1 and A.2.ii, assume

Assumption 2.8.i: $q_t$ is measurable-$\mathcal{F}_{t-\tau}^{t}$,
$\tau < \infty$ and there exists $d_t$ measurable-$\mathcal{F}_{t-\tau}^{t}$ such
that $|q_t(\omega, \theta)| \le d_t(\omega)$ (element by element) for all
$\theta$ in $\Theta$, and for $r \ge 1$ and some $\delta > 0$, $d_t$ is
$r+\delta$ integrable uniformly in $t$;

Assumption 2.8.ii: $\{W_t\}$ is a mixing sequence such that either $\phi(m)$ is
of size $r/(2r-1)$ or $\alpha(m)$ is of size $r/(r-1)$, $r > 1$.

Then $\{n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta))\}$ is equicontinuous on
$\Theta$ and
$n^{-1} \sum_{t=1}^{n} [q_t(\omega, \theta) - E(q_t(\omega, \theta))] \to 0$ a.s.
as $n \to \infty$ uniformly on $\Theta$.

This result ensures a uniform law of large numbers under specific moment and
memory conditions. Note that the memory conditions specify not only that
$\phi(m)$ or $\alpha(m)$ tend to zero at a specific rate as $m \to \infty$,
implying that the data generating process exhibits a form of asymptotic
independence, but also that $q_t(\omega, \theta)$ is
measurable-$\mathcal{F}_{t-\tau}^{t}$, $\tau < \infty$, which implies that $q_t$
depends on only a finite number of recent lagged values of $W_t$. This ensures
that $\{q_t(\omega, \theta)\}$ is also a mixing process (see Theorem 3.49 of
White [46]). The condition that $\tau < \infty$ is a great convenience, but not a
necessity. For a uniform law of large numbers for functions of mixing processes
with $\tau$ unrestricted, see Gallant and White [22]. The machinery required to
allow $\tau \to \infty$ is rather complex; we shall avoid this complexity here.

We  apply  these  results  by  imposing  the  following  assumption. 

ASSUMPTION A.3:

(A.3.i) $\{q_t\}$ and $\{W_t\}$ satisfy either Assumptions 2.5.i and 2.5.ii or
Assumptions 2.8.i and 2.8.ii;

(A.3.ii) there exists a $O(1)$ sequence of non-stochastic $p$-vectors
$\{\pi_n^*\}$ such that $\hat\pi_n - \pi_n^* \to 0$ a.s. as $n \to \infty$.
n    n 

Finally, we need to ensure that Assumption 2.2.iii holds. For now we simply
impose this directly for an appropriate choice of $\bar{Q}_n$. In Section 3 below
we provide sufficient conditions related to the familiar rank condition for
identification.

ASSUMPTION A.4: When it exists, the function
$\bar{Q}_n : \Theta \to \mathbb{R}$ defined as

$$\bar{Q}_n(\theta) \equiv g_n\Big(n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta)),\ \pi_n^*\Big)$$

has an identifiably unique minimizer $\theta_n^*$ in $\Theta$ for all $n$
sufficiently large.


When the model is correctly specified, this condition is straightforward to
verify using plausible primitive assumptions on the model and the estimator.
Even when the model is not correctly specified, plausible primitive conditions
are often easily available. For example, in the case of linear regression with
$g_n(\psi, \pi) = \psi$, $q_t(\omega, \theta) \equiv (y_t - X_t\theta)^2$, it
suffices that $E(X'X/n)$ is positive definite uniformly in $n$. The desired
consistency result can now be stated.


THEOREM 2.9 (Consistency): Given Assumptions A.1-A.4, $\hat\theta_n - \theta_n^* \to 0$ a.s.
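The linear regression example can be made concrete with a small simulation. The sketch below is illustrative only: the data generating process (uniform regressors, a quadratic true regression function, Gaussian noise) and the names `pseudo_true_slope_demo` and `THETA_STAR` are assumptions introduced here, not part of the theory above. It uses i.i.d. data, which is only a special case of the heterogeneous, dependent processes the theorem allows. The fitted model $Y_t = X_t\theta + \text{error}$ is misspecified, yet the least squares estimator settles near the minimizer $\theta^* = E(X^3)/E(X^2)$ of the expected squared error.

```python
import random


def pseudo_true_slope_demo(n, seed=0):
    """Least squares through the origin when the true regression is quadratic.

    Hypothetical DGP: X_t ~ Uniform(0, 1), Y_t = X_t**2 + Gaussian noise.
    The fitted model Y_t = X_t * theta + error is therefore misspecified.
    """
    rng = random.Random(seed)
    sxx = 0.0
    sxy = 0.0
    for _ in range(n):
        x = rng.random()
        y = x * x + rng.gauss(0.0, 0.1)
        sxx += x * x
        sxy += x * y
    # Minimizer of n^{-1} * sum_t (Y_t - X_t * theta)**2; requires sxx / n > 0,
    # the scalar analogue of E(X'X/n) being positive definite.
    return sxy / sxx


# For this DGP, theta* = E(X^3) / E(X^2) = (1/4) / (1/3) = 0.75.
THETA_STAR = 0.75
```

Calling `pseudo_true_slope_demo(20000)` should return a value close to `THETA_STAR`, even though no value of $\theta$ makes the linear model correct.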

In contrast to the consistency results of Burguete, Gallant and Souza [12],
Theorem 2.9 allows for econometric models with error terms which are not i.i.d.
Also, the explanatory variables of the model may be stochastic and contain both
lagged dependent and explanatory variables, allowing for dynamics.

The consistency of the least mean distance and method of moments estimators
is established in separate theorems in Burguete, Gallant and Souza (Theorems 2
and 5, respectively). Both types of estimators are covered by Theorem 2.9 by
considering an optimand of the form (2.1). We illustrate how this is
accomplished in the following two sections.

3.   CONSISTENCY  OF  MAXIMUM  LIKELIHOOD  ESTIMATION 

Sufficient conditions for the consistency of the maximum likelihood
estimator (MLE) of Fisher [19,20] have been given for the case of a correctly
specified model by an impressive list of authors. Among the important early
contributions are those of Doob [16], Cramer [14], Wald [38,39] and Le Cam [32].
Of these, Wald [38] is the first to have provided conditions ensuring the
consistency of the MLE for generally dependent stochastic processes. Subsequent
results for dependent processes following Wald's [38] approach have been given by
Bar-Shalom [6], Weiss [40,41], Crowder [15] and Hall and Heyde [24], among
others. This approach, however, studies maximum likelihood estimators obtained
as solutions to the first order conditions for a maximum of the likelihood
function. Such estimators are appropriately treated using methods illustrated in
the following section.

In this section we treat estimators obtained directly as the solution to a
maximization problem, thereby avoiding differentiability assumptions. This is
the approach taken by Doob [16], Wald [39] and Le Cam [32], although none of
these treat the case of dependent observations or estimation of misspecified
models. Results for dependent observations have been given by Silvey [37], Bhat
[10], Caines [13] and Heijmans and Magnus [26]. Results for misspecified models
with independent observations have been given by Huber [28]. The results
presented here allow for both dependent observations and misspecified models, and
are closely related to the results of White [47]. Since we allow for the
possibility of a misspecified model, we shall refer to the maximum likelihood
estimator for a misspecified model as a quasi-maximum likelihood estimator
(QMLE).

Given  the  results  of  the  previous  section,  very  little  effort  is  needed  to 
establish  general  conditions  ensuring  the  consistency  of  the  QMLE  for  dependent 
data  processes.   It  suffices  simply  to  choose  g  and  q  appropriately  and  to 
provide  conditions  ensuring  that  the  conditions  of  Theorem  2.9  hold  for  these 
choices. 


We maintain Assumption A.1. We replace Assumption A.2 with the following
assumption.

ASSUMPTION B.1:

(B.1.i) For all $n$, $g_n(\psi, \pi) = -\psi$, where $\psi \in \mathbb{R}$.

(B.1.ii) $q_t(\omega, \theta) = \ln f_t(Y_t \mid Z_t, \theta)$ satisfies Assumption 2.3.ii where for each $\theta$
in $\Theta$, $f_t$ is a conditional density of $Y_t$ given $Z_t$, where $W_t = (X_t, Y_t)$ and $Z_t(\omega)$ is
measurable-$\sigma(\ldots, W_{t-1}, X_t)$.

Since $\hat\pi_n$ is irrelevant, no conditions apply to it. With this assumption, the
estimation problem becomes

$$\min_{\theta \in \Theta} Q_n(\omega, \theta) = -n^{-1} \sum_{t=1}^{n} \ln f_t(Y_t \mid Z_t, \theta).$$

Note that the possibility for misspecification arises because nothing acts to
guarantee that the conditional density of $Y_t$ given $Z_t$ implied by Assumption A.1
is given by $f_t(Y_t \mid Z_t, \theta^o)$ for some $\theta^o$ in $\Theta$. Also note that $Z_t$ can be any subset
of the vector $(\ldots, W_{t-1}, X_t)$. If $Z_t$ is measurable with respect to a sub-$\sigma$-algebra
of $\sigma(\ldots, W_{t-1}, X_t)$, the model specified may also represent a dynamic
misspecification in the sense that the true conditional density of $Y_t$ given $X_t$
and past $W_t$'s may well involve all these variables.

When $W_t$ is an independent sequence and $Z_t = X_t$, the estimator under
consideration is the conditional maximum likelihood estimator of Andersen [5].
Andersen recommends use of this estimator in situations where one is conditioning
on sufficient statistics for incidental parameters. The present formulation
allows a generalization of Andersen's approach to the case of dependent
observations. Note that $X_t$ may be null, so that $Y_t = W_t$, allowing for
unconditional maximum likelihood as well.
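As a rough illustration of the estimation problem defined by Assumption B.1, the following sketch minimizes the negative average quasi-log-likelihood directly over a compact parameter set, without using derivatives. Everything specific in it is an assumption made for illustration: the AR(1) data generating process with uniform innovations, the unit-variance Gaussian conditional density, the choice $Z_t = Y_{t-1}$, and the grid search standing in for exact minimization. The density is misspecified (the innovations are not Gaussian and do not have unit variance), but the conditional mean is correct.

```python
import math
import random


def simulate_ar1(n, rho, seed=0):
    """AR(1) data with uniform (non-Gaussian) innovations; a hypothetical DGP."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.uniform(-1.0, 1.0))
    return y


def neg_avg_quasi_loglik(y, theta):
    """Q_n(theta) = -n^{-1} sum_t ln f_t(Y_t | Z_t, theta).

    f_t is taken to be a N(theta * Y_{t-1}, 1) density, with Z_t = Y_{t-1}.
    """
    n = len(y) - 1
    total = 0.0
    for t in range(1, len(y)):
        resid = y[t] - theta * y[t - 1]
        total += -0.5 * math.log(2.0 * math.pi) - 0.5 * resid * resid
    return -total / n


def qmle_grid(y, lo=-0.95, hi=0.95, steps=191):
    """Minimize Q_n over a finite grid on the compact set [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda theta: neg_avg_quasi_loglik(y, theta))
```

For example, `qmle_grid(simulate_ar1(2000, 0.5))` should land near 0.5, up to sampling error and the grid spacing. The grid search is a convenience of the sketch; any procedure that locates the minimizer over the compact set would serve.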

The  existence  of  the  quasi-maximum  likelihood  estimator  can  be  established 
immediately. 

COROLLARY 3.1: Given Assumptions A.1 and B.1, there exists a measurable function
$\hat\theta_{QMLE}(\omega)$ such that

$$-Q_n(\omega, \hat\theta_{QMLE}(\omega)) = \sup_{\theta \in \Theta} n^{-1} \sum_{t=1}^{n} \ln f_t(Y_t \mid Z_t, \theta).$$

To obtain consistency, we maintain the relevant part of Assumption A.3 (i.e.
A.3.i) and maintain Assumption A.4. Without further structure, $\theta_n^*$ of Assumption
A.4 must simply be interpreted as the parameter value which minimizes

$$\bar Q_n(\theta) = -n^{-1} \sum_{t=1}^{n} E(\ln f_t(Y_t \mid Z_t, \theta)).$$

Under additional mild conditions, White [48, chapter 4] shows that $\theta_n^*$ can be
given  an  information  theoretic  interpretation  in  terms  of  the  Kullback-Leibler 
[31]  Information  Criterion,  in  a  manner  analogous  to  that  provided  by  Akaike  [1] 
and  White  [45]  for  the  case  of  independent  observations.   White  [48,  chapter  5] 
also  provides  more  primitive  conditions  on  the  model  which  help  ensure  that 
Assumption  A. 4  is  satisfied. 

It should be pointed out that when $W_t$ is heterogeneous and the mixing
conditions are utilized, then maintaining Assumption A.3.i may force a dynamic
misspecification since $f_t(Y_t \mid Z_t, \theta)$ is required by A.3.i to depend only on a
finite number of lagged $W_t$'s, whereas a correct specification might require
$f_t(Y_t \mid Z_t, \theta)$ to depend on all past $W_t$'s. Thus, while correct specification for
certain heterogeneous Markov processes is allowed, correct specification for
heterogeneous non-Markov processes is not necessarily allowed. Stationary
ergodic non-Markov processes are correctly specifiable under A.3.i, however.
Results for heterogeneous non-Markov mixing processes are covered by results of
Gallant and White [22].

The  consistency  result  is 

COROLLARY 3.2: Given Assumptions A.1, B.1, A.3.i and A.4, $\hat\theta_{QMLE} - \theta_n^* \to 0$ a.s.

Recently,  Levine  [33]  has  shown  that  maximum  likelihood  estimators  can  be 
consistent  despite  the  presence  of  dynamic  misspecification.  The  key  condition 
is  as  follows. 

ASSUMPTION B.2: The conditional density of $Y_t$ given $Z_t$ implied by Assumption
A.1 is given by $f_t(Y_t \mid Z_t, \theta^o)$ for some $\theta^o$ in $\Theta$, $t = 1, 2, \ldots$.


COROLLARY 3.3: Given Assumptions A.1, B.1, A.3.i, A.4 and B.2, $\hat\theta_{QMLE} \to \theta^o$ a.s.


Note that A.4 is still required even in the presence of B.2.

To give some additional insight into the content of Corollary 3.3, we
compare it with the results of Crowder [15] for consistency of the maximum
likelihood estimator. The first distinction to be drawn is that the present
results apply to conditional maximum likelihood estimators with the possible
presence of dynamic misspecification. Crowder's results apply to correctly
specified unconditional maximum likelihood estimators. Next, no
differentiability assumptions are imposed here in obtaining the present consistency
results; Crowder's conditions require that the likelihood function be
continuously differentiable of order two.

Crowder  allows  his  likelihood  function  for  a  given  observation  to  depend  on 
all  previous  observations.   We  explicitly  allow  this  for  stationary  ergodic 
processes  and  explicitly  rule  this  out  for  heterogeneous  mixing  processes  on 
grounds  of  convenience.   The  extension  to  allow  dependence  on  all  previous 
observations for heterogeneous mixing processes is nontrivial, and this issue is
not  explicitly  addressed  by  Crowder.   For  this  extension,  see  Gallant  and  White 
[22].   Crowder  does  consider  an  example  involving  a  stationary  mixing  process 
(hence  an  ergodic  process).   The  conditions  imposed  on  the  mixing  coefficients 
here  are  weaker  than  those  imposed  by  Crowder  in  his  example. 

The area in which Crowder's results do allow considerably greater
flexibility is that Crowder does not impose domination conditions, as we do here.
Thus, Crowder's results apply to such situations as ordinary least squares
applied to a linear model with a time trend. This situation is not covered by
our results. Such situations can be covered in a framework closely related to
that adopted here by dispensing with Assumptions 2.2.ii and 2.2.iii, and replacing
these with weaker conditions, as in Wooldridge [50].

Finally, Crowder provides an elegant treatment of the incidental parameters
problem. We do not address that issue here.


4.   CONSISTENCY  OF  M-ESTIMATORS  AND  GENERALIZED  METHOD  OF  MOMENTS  ESTIMATORS 

An extremely useful estimation technique introduced by Huber [28] arises
when the assumption that

$$\bar\psi_n(\theta_n^*) \equiv n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta_n^*)) \to 0 \quad \text{as } n \to \infty$$

for some sequence $\{\theta_n^*\}$ in $\Theta$ is used to construct an estimator $\hat\theta_n(\omega)$ which

satisfies 

t=1 
Such estimators are called m-estimators. (In Huber's i.i.d. setup, $\theta^*$ is
independent of $n$. Here, it is natural to allow $\theta^*$ to depend on $n$.) In certain
cases, such an estimator may be obtained directly as the solution to the problem

$$\hat\psi_n(\theta) = 0.$$

The maximum likelihood estimator occurs as the special case for which
$q_t(\omega, \theta) = \nabla_\theta \ln f_t(Y_t \mid Z_t, \theta)$, provided that the log-likelihood function is
differentiable. The instrumental variables estimator of Reiersøl [34] and Geary
[23] occurs as the special case in which $q_t(\omega, \theta) = Z_t'(Y_t - X_t\theta)$, where now
$W_t = (X_t, Y_t, Z_t)$ and $Z_t$ is a vector of the same dimension as $X_t$.
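The instrumental variables case can be sketched numerically. The code below is a minimal illustration under assumptions introduced here: a scalar regressor and instrument, a simulated design in which the regressor is correlated with the error, and the hypothetical function name `iv_and_ols`. In the scalar case the estimating equation $n^{-1}\sum_t Z_t(Y_t - X_t\theta) = 0$ has the closed-form solution $\sum_t Z_t Y_t / \sum_t Z_t X_t$, which the code compares with ordinary least squares.

```python
import random


def iv_and_ols(n, theta0=1.0, seed=0):
    """Solve n^{-1} sum_t Z_t (Y_t - X_t theta) = 0 and compare with OLS.

    Hypothetical design: X_t = Z_t + U_t and the error E_t = U_t + noise,
    so X_t is correlated with E_t while Z_t is not.
    """
    rng = random.Random(seed)
    szx = szy = sxx = sxy = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        x = z + u
        e = u + 0.5 * rng.gauss(0.0, 1.0)
        y = theta0 * x + e
        szx += z * x
        szy += z * y
        sxx += x * x
        sxy += x * y
    iv = szy / szx    # root of the IV estimating equation
    ols = sxy / sxx   # root of the equation with Z_t replaced by X_t
    return iv, ols
```

With this design the IV root should be close to `theta0`, while the least squares root drifts toward `theta0 + 0.5`, since the moment condition it is built on does not hold at `theta0`.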

In other cases, no such solution will exist (e.g., when $q_t$ is of greater
dimension than $\theta$), but regardless of whether it does or not, an estimator can be
obtained by finding the value $\hat\theta_n(\omega)$ which makes $\hat\psi_n(\theta)$ as close to zero as
possible. A common measure of closeness is a metric for the space in which
$\hat\psi_n(\theta)$ takes its values. However, a metric has properties irrelevant for present
purposes. Here it suffices to measure closeness in the following way.


DEFINITION 4.1: A function $D_n : \mathbb{R}^{\ell} \times \mathbb{R}^{\ell} \to \mathbb{R}$ is called a discrepancy from zero if
and only if $D_n(\psi, 0) \ge 0$ for all $\psi$ in $\mathbb{R}^{\ell}$ and $D_n(\psi, 0) = 0$ if and only if $\psi = 0$.

For simplicity, $D_n$ satisfying this definition will be called simply a
"discrepancy." In general, we could then choose $\hat\theta_n(\omega)$ to solve the problem

$$\min_{\theta \in \Theta} D_n(\hat\psi_n(\theta), 0).$$

However, in practice, one might wish to choose $D_n$ to depend implicitly on unknown
parameters $\pi_n^*$. This dependence can be made explicit by writing

$$g_n(\hat\psi_n(\theta), \pi_n^*) \equiv D_n(\hat\psi_n(\theta), 0).$$

We obtain an estimator as the solution to the problem

$$\min_{\theta \in \Theta} Q_n(\omega, \theta) = g_n(\hat\psi_n(\theta), \hat\pi_n) \equiv \hat D_n(\hat\psi_n(\theta), 0),$$

where $\hat\pi_n$ is a consistent estimator for $\pi_n^*$. The theory of Section 2 applies
immediately to such estimators. Existence follows from Corollary 2.3 and
consistency from Theorem 2.9. Note that $\theta_n^*$ is defined as the parameter vector
minimizing $D_n(\bar\psi_n(\theta), 0)$ in A.4. Because it is not required that $D_n(\bar\psi_n(\theta_n^*), 0)$
attain the value zero, the equations $\hat\psi_n(\theta) = 0$ which motivate the m-estimator are
allowed to be misspecified.

For the case in which $\bar\psi_n(\theta^*) = 0$, or more generally when $\bar\psi_n(\theta_n^*) \to 0$ as $n \to \infty$
for some sequence $\{\theta_n^*\}$, a condition closely related to the rank condition for
the identification of the parameters of systems of simultaneous equations can be
shown to assist in verifying Assumption A.4. In the standard linear simultaneous
equations model identification of the true parameter vector $\theta^o$ relies on
establishing that the familiar rank and order conditions are satisfied (see e.g.,
Fisher [18]). These assumptions ensure that distinct parameter values give rise
to distinct values for the function $\bar\psi_n$, i.e., if this function is not one-to-one
at least in a neighborhood of the true parameter vector, then the true parameter
vector is not identifiable.^4

For the linear case, a necessary condition for $\bar\psi_n(\theta)$ to be one-to-one is the
familiar order condition. This guarantees that the range space is "large" enough
for $\bar\psi_n(\theta)$ to be one-to-one. Clearly, this is a necessary condition for
identifiability in any more general case, regardless of whether or not the
probability model is correctly specified. The following definition is
appropriate for the present case.


DEFINITION 4.2: Let $\{\psi_n : \Theta \to \mathbb{R}^{\ell}\}$ be a sequence of functions. $\{\psi_n\}$ satisfies the
generalized order condition if and only if $\dim(\Theta) = k \le \ell$.


The rank condition of the linear case is both necessary and sufficient for
identification of the parameters of a correctly specified model. When it is
satisfied, the image of the parameter space under the linear function is of
dimension $k$ and there exists a one-to-one, onto mapping between the parameter
space and a subset of the range space of $\bar\psi_n$.

To establish identifiability in the present context, a condition slightly
stronger than the one-to-oneness of the individual functions $\psi_n$ is required.
This is because the one-to-oneness may vanish in the limit even though it is
satisfied for each of the individual functions. This could make identifiability
impossible in the limit. Sequences which satisfy the following definition will
avoid this problem. Furthermore, in the standard linear simultaneous equations
model, it guarantees that the standard rank and order conditions are satisfied.

Let $\|\cdot\|$ denote the standard Euclidean norm.

DEFINITION 4.3: Let $\{\psi_n : \Theta \to \mathbb{R}^{\ell}\}$ be a sequence of functions, and let $\theta_n^* \in \eta \subset \Theta$,
$n = 1, 2, \ldots$. $\{\psi_n\}$ satisfies the generalized rank condition at $\{\theta_n^*\}$ on $\eta$ if
and only if for every $\varepsilon > 0$ there exists $\delta > 0$ such that for any $\theta \in \eta$ for which
$\|\theta - \theta_n^*\| > \varepsilon$ it follows that $\|\psi_n(\theta) - \psi_n(\theta_n^*)\| > \delta$ for all $n$ sufficiently
large.
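A toy numerical check may help fix ideas about why one-to-oneness of each individual function is not enough. The two scalar sequences below, $\psi_n(\theta) = \theta/n$ and $\psi_n(\theta) = (1 + 1/n)\theta$, are hypothetical examples chosen for this sketch, as are the helper name `min_separation` and the grid. Both functions are one-to-one for every $n$, but only the second keeps $\psi_n(\theta)$ bounded away from $\psi_n(\theta^*)$ as $n$ grows.

```python
def min_separation(psi, n, theta_star, eps, grid):
    """Smallest |psi(n, theta) - psi(n, theta_star)| over grid points
    with |theta - theta_star| > eps."""
    far = [th for th in grid if abs(th - theta_star) > eps]
    return min(abs(psi(n, th) - psi(n, theta_star)) for th in far)


def psi_vanishing(n, theta):
    """One-to-one for each n, but the separation shrinks like 1 / n."""
    return theta / n


def psi_stable(n, theta):
    """One-to-one for each n, with separation bounded below by eps."""
    return (1.0 + 1.0 / n) * theta


# Grid on the compact set [-1, 1], standing in for the set eta.
GRID = [-1.0 + 0.01 * i for i in range(201)]
```

Evaluating `min_separation(psi_vanishing, n, 0.0, 0.5, GRID)` for increasing `n` shows the separation collapsing toward zero, so no fixed $\delta > 0$ works for all large $n$; the same call with `psi_stable` stays above 0.5.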


Although it can be shown that having the generalized rank condition hold on
an appropriate set $\eta$ is necessary for Assumption A.4 to be satisfied, it is not
sufficient. This is because even though true parameters for a correctly
specified model may be identified, the estimator defined by choice of $D_n$ may fail
to have a unique limit, and therefore fail to be identifiable.

In the present context, we control the behavior of the discrepancies $D_n$
so that the limiting discrepancy from zero of a vector is bounded away from zero
except when the vector is zero. The following definition guarantees this and
somewhat more.


DEFINITION 4.4: Let $\{D_n\}$ be a sequence of discrepancies. $\{D_n\}$ is an
asymptotically uniform sequence of discrepancies if and only if there exists a
monotone transformation $h$ continuous at zero with $h(0) = 0$ and constants $\delta > 0$
and $\Delta < \infty$ such that for any $\psi \in \mathbb{R}^{\ell}$, $\delta\|\psi\| \le h(D_n(\psi, 0)) \le \Delta\|\psi\|$ for all $n$
sufficiently large.


The following result shows that by imposing the generalized rank condition
and by choosing an asymptotically uniform sequence $\{D_n\}$ to define an estimator,
one can verify the identifiable uniqueness Assumption A.4.


THEOREM 4.5: Suppose that

Assumption 4.5.i: $\{D_n(\psi, 0) \equiv g_n(\psi, \pi_n^*)\}$ is an asymptotically uniform sequence
of discrepancies;

Assumption 4.5.ii: There exists a sequence $\{\theta_n^*\}$ in $\Theta$, a compact subset of $\mathbb{R}^k$,
such that $\bar\psi_n(\theta_n^*) \to 0$ as $n \to \infty$;

Assumption 4.5.iii: There exist a constant $\rho > 0$ and integer $N_0 < \infty$ such that
$\{\bar\psi_n\}$ satisfies the generalized rank condition at $\{\theta_n^*\}$ on
$\Theta(\rho, N_0) \equiv \{\theta \in \Theta : D_n(\bar\psi_n(\theta), 0) < \rho \text{ for some } n > N_0\}$.

Then $\theta_n^*$ is an identifiably unique minimizer of $D_n(\bar\psi_n(\theta), 0)$.


By showing how Assumption A.4 can be verified when the probability model is
correctly specified in the sense of Assumption 4.5.ii, this result provides
useful additional insight into the meaning of Assumption A.4.

We illustrate the use of this result and those of Section 2 by providing
a consistency result for the generalized method of moments (GMM) estimator of
Hansen [25]. Among other things, this estimator is useful in estimating the
parameters of the implicit nonlinear system of simultaneous equations

$$U_t = F_t(\omega, \theta^o), \quad t = 1, \ldots, n$$

when it is known that there exists

$$Z_t = G_t(\omega, \theta^o), \quad t = 1, \ldots, n$$

such that

$$E(U_t \otimes Z_t) = 0.$$

Letting

$$q_t(\omega, \theta) = F_t(\omega, \theta) \otimes G_t(\omega, \theta),$$

it follows that

$$\bar\psi_n(\theta^o) = n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta^o)) = 0.$$

The GMM estimator is obtained by making the choices formalized in the following
assumption.

ASSUMPTION C.1:

(C.1.i): For all $n$, $g_n(\psi, \pi) = \psi' P \psi$, where $\psi \in \mathbb{R}^{\ell}$ and $P$ is a symmetric positive
semi-definite matrix such that $\pi = \mathrm{vech}\, P$;

(C.1.ii): $q_t(\omega, \theta)$ satisfies Assumption A.2.ii;

(C.1.iii): there exists $\{\hat P_n(\omega)\}$, a sequence of symmetric positive semi-definite
matrices such that $\hat\pi_n(\omega) = \mathrm{vech}\, \hat P_n(\omega)$ satisfies Assumption A.2.iii.


Given these choices, the GMM estimator solves the problem

$$\min_{\theta \in \Theta} \Big(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)\Big)' \hat P_n(\omega) \Big(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)\Big).$$

Existence of the GMM estimator follows immediately from Corollary 2.3.
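The GMM minimization problem can be sketched for a simple overidentified case. The example is hypothetical throughout: it assumes scalar data with mean $\theta$ and unit variance, two moment functions $q_t(\theta) = (Y_t - \theta,\ Y_t^2 - \theta^2 - 1)'$, a fixed identity weighting matrix, and a grid search over the compact set $[0, 4]$ in place of an exact minimizer. The names `sample_moments`, `gmm_objective` and `gmm_grid` are introduced only for this illustration.

```python
import random


def sample_moments(ybar, y2bar, theta):
    """psi_hat_n(theta) = n^{-1} sum_t q_t(theta) for
    q_t(theta) = (Y_t - theta, Y_t**2 - theta**2 - 1),
    written in terms of the sample mean and sample second moment."""
    return (ybar - theta, y2bar - theta * theta - 1.0)


def gmm_objective(ybar, y2bar, theta, weight):
    """Quadratic form psi' P psi for a 2 x 2 symmetric weight matrix P."""
    m1, m2 = sample_moments(ybar, y2bar, theta)
    (p11, p12), (p21, p22) = weight
    return m1 * (p11 * m1 + p12 * m2) + m2 * (p21 * m1 + p22 * m2)


def gmm_grid(data, weight, lo=0.0, hi=4.0, steps=401):
    """Minimize the GMM objective over a grid on the compact set [lo, hi]."""
    n = len(data)
    ybar = sum(data) / n
    y2bar = sum(y * y for y in data) / n
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda th: gmm_objective(ybar, y2bar, th, weight))


def simulate_normal(n, mean, seed=0):
    """Hypothetical DGP: i.i.d. N(mean, 1) observations."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(n)]
```

For instance, `gmm_grid(simulate_normal(2000, 2.0), ((1.0, 0.0), (0.0, 1.0)))` should return a value near 2. There are two moment conditions and one parameter, so in general no $\theta$ sets both sample moments to zero, and the estimator is defined by minimization rather than by solving an equation.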


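COROLLARY 4.6: Given Assumptions A.1 and C.1, there exists a measurable
function $\hat\theta_{GMM}(\omega)$ such that

$$Q_n(\omega, \hat\theta_{GMM}(\omega)) = \inf_{\theta \in \Theta} \Big(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)\Big)' \hat P_n(\omega) \Big(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)\Big).$$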


To obtain consistency, we impose the following assumptions.

ASSUMPTION C.2:

(C.2.i): Assumption A.3.i holds for $\{q_t\}$ and $\{W_t\}$.

(C.2.ii): There exists an $O(1)$ sequence of non-stochastic symmetric positive
semi-definite matrices $P_n^*$ such that $\hat P_n - P_n^* \to 0$ a.s. as $n \to \infty$.

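Given Assumption C.2.ii, we can identify the discrepancy measure as

$$D_n(\psi, 0) = \psi' P_n^* \psi.$$

Note that $D_n(\psi, 0)^{1/2}$ is a weighted Euclidean norm. Thus,

$$g_n(\bar\psi_n(\theta), \pi_n^*) = D_n(\bar\psi_n(\theta), 0)$$

where $\pi_n^* = \mathrm{vech}\, P_n^*$.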


ASSUMPTION C.3:

(C.3.i) $\{P_n^*\}$ is uniformly positive definite;

(C.3.ii) there exists $\theta^o$ in $\Theta$ such that $n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta^o)) \to 0$ as $n \to \infty$;

(C.3.iii) there exists a constant $\rho > 0$ and an integer $N_0 < \infty$ such that
$\{\bar\psi_n(\theta) = n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta))\}$ satisfies the generalized rank condition at $\theta^o$ on
$\Theta(\rho, N_0)$ as defined in Assumption 4.5.iii.


The consistency result for the GMM estimator is

COROLLARY 4.7: Given Assumptions A.1, C.1, C.2 and C.3, $\hat\theta_{GMM} \to \theta^o$ a.s.

This result is broadly similar to Theorem 2.1 of Hansen [25], although it differs
in certain particulars. First, Hansen requires $\{W_t\}$ to be strictly stationary
and ergodic. Although we allow this possibility, we also allow for heterogeneous
processes. Next, Hansen only requires $(\Theta, \sigma)$ to be a separable metric space with
$\Theta$ locally compact. Our assumption that $\Theta$ is a compact subset of a finite
dimensional Euclidean space suffices for this. Although this is a commonly
encountered situation in econometrics, the main result of Theorem 2.9 is easily
extended to allow $(\Theta, \sigma)$ to be a separable metric space.

Hansen  requires  the  function  q  to  be  time-invariant,  but  we  do  not. 
Similarly,  Hansen  requires  P*  to  converge  to  a  constant  limit.  This  requirement 
also  is  not  imposed  here.  These  are  natural  restrictions  for  Hansen  since  his 
data  are  assumed  stationary.  Here  we  require  somewhat  greater  flexibility. 

In place of our domination conditions, Hansen imposes a super-continuity
requirement (see Definition 2.2, Hansen [25]). While this condition appears
somewhat weaker, it may also be somewhat more difficult to verify. Finally,
Hansen directly assumes identification (hence identifiability). We provide
useful sufficient conditions related to the familiar rank condition as embodied
in Assumption C.3. With stationarity, it is natural to assume for identification
that $E(q_t(\omega, \theta^o)) = 0$ for all $t = 1, 2, \ldots$ as Hansen does. Here we adopt the
weaker requirement that $n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta^o)) \to 0$ as $n \to \infty$.

Despite these differences, the present result is clearly identical in spirit
to Hansen's result. In fact, for the consistency property, it represents a
useful version of the extension to the nonstationary context which Hansen [25]
suggests in his concluding remarks.


As Hansen notes, his results cover the nonlinear instrumental variables
estimators of Amemiya [3,4], Jorgensen and Laffont [30] and Gallant [22]. The
present result also covers the nonlinear least squares estimators of White [43]
and White and Domowitz [49].

5.   CONCLUSION 

This  paper  has  presented  a  theory  of  consistent  estimation  for  parametric 
models  which  bridges  the  gap  between  the  general  consistency  result  of  Domowitz 
and  White  [17]  and  the  myriad  of  single  cases  characterizing  much  of  the 
econometrics  literature,  and  which  extends  the  unification  of  Burguete,  Gallant 
and Souza to the non-i.i.d. case. Two special cases, (quasi-)maximum likelihood
and  generalized  method  of  moments  estimators,  are  discussed  in  some  detail. 
These  examples  by  no  means  exhaust  the  possible  applications  of  the  present 
theory. For example, minimum chi-square estimation as discussed by Rothenberg
[36, p. 24] can be treated by setting $q_t(\omega, \theta) = \theta$, $\hat\pi_n = (\hat\theta_n, \mathrm{vech}\ \mathrm{avar}\ \hat\theta_n)$ and
defining

$$g_n\Big(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta),\ \hat\pi_n\Big) = (\hat\theta_n - \theta)' [\mathrm{avar}\ \hat\theta_n]^{-1} (\hat\theta_n - \theta)$$

where $\Theta = \{\theta \in \Theta^* : h(\theta) = 0\}$, and $\Theta^*$ is a compact subset of a finite dimensional
Euclidean space.




The present approach also provides a convenient framework for establishing
asymptotic normality results, and for selecting the most efficient estimator
within any subclass of the estimators covered by this theory. These topics are
addressed in a subsequent paper (Bates and White [7]).

MATHEMATICAL APPENDIX

PROOF OF THEOREM 2.3: As $g_n$ is continuous on $\mathbb{R}^{\ell} \times \mathbb{R}^{p}$, it follows from Theorem
13.3 of Billingsley [1979] that $Q_n(\omega, \theta) = g_n(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta), \hat\pi_n(\omega))$ is
measurable for each $\theta$ given the measurability of $q_t$ and $\hat\pi_n$. As $g_n$ is continuous
and $q_t$ is continuous on $\Theta$ for each $\omega$, it follows that $Q_n(\omega, \theta)$ is continuous on $\Theta$
for each $\omega$. By Assumption A.2.ii, $\Theta$ is a compact subset of $\mathbb{R}^k$. Thus Assumption
2.2.i is satisfied and the existence of $\hat\theta_n(\omega)$ follows from Theorem 2.2, given
Assumption A.1.

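PROOF OF LEMMA 2.4: As $\{g_n\}$ is continuous uniformly in $n$, given $\varepsilon > 0$, there
exists $\delta(\varepsilon) > 0$ such that if $x, y \in \Phi$, a compact set, and $\|x - y\| < \delta(\varepsilon)$, then
$\|g_n(x) - g_n(y)\| < \varepsilon$.

For any $\varepsilon > 0$, define $\delta(\varepsilon)$ as immediately above. Since $\hat\psi_n(\omega, \theta) - \bar\psi_n(\theta) \to 0$ a.s.
uniformly on $\Theta$, there exists $F$ in $\mathcal{F}$, $P(F) = 1$ such that for (any) $\delta(\varepsilon) > 0$
and each $\omega$ in $F$, there exists an integer $N_0(\omega, \delta(\varepsilon))$ such that for all
$n > N_0(\omega, \delta(\varepsilon))$

$$\sup_{\theta \in \Theta} \|\hat\psi_n(\omega, \theta) - \bar\psi_n(\theta)\| < \delta(\varepsilon).$$

Further, since $\bar\psi_n(\theta)$ is interior to $\Phi$ uniformly in $n$ for all $\theta$ in $\Theta$, for each $\omega$
in $F$ there exists $N_1(\omega)$ such that for all $n > N_1(\omega)$, $\hat\psi_n(\omega, \theta)$ lies interior to $\Phi$
for all $\theta$ in $\Theta$. Thus for each $\omega$ in $F$ there exists $N_2(\omega, \varepsilon) \equiv \max(N_0(\omega, \delta(\varepsilon)), N_1(\omega))$
such that for all $n > N_2(\omega, \varepsilon)$, $\hat\psi_n(\omega, \theta)$ and $\bar\psi_n(\theta)$ are in $\Phi$ and
$\|\hat\psi_n(\omega, \theta) - \bar\psi_n(\theta)\| < \delta(\varepsilon)$ for all $\theta$ in $\Theta$. It follows from the continuity of $\{g_n\}$
uniformly in $n$ that for each $\omega$ in $F$, $\|g_n(\hat\psi_n(\omega, \theta)) - g_n(\bar\psi_n(\theta))\| < \varepsilon$ for all
$n > N_2(\omega, \varepsilon)$ and all $\theta$ in $\Theta$. Since $\varepsilon$ is arbitrary and $P(F) = 1$, it follows that
$g_n(\hat\psi_n(\omega, \theta)) - g_n(\bar\psi_n(\theta)) \to 0$ a.s. as $n \to \infty$ uniformly on $\Theta$.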

PROOF OF THEOREM 2.5: The proof is identical to that of Jennrich's [29] Theorem
2, except that the Ergodic Theorem (e.g. White [46, Theorem 3.34]) is applied in
place of Kolmogorov's law of large numbers for i.i.d. random variables.


PROOF OF THEOREM 2.9: We verify the conditions of Theorem 2.2. Given Assumptions
A.1 and A.2, Assumption 2.2.i is satisfied and $\hat\theta_n(\omega)$ exists by Corollary 2.3.
Given Assumption A.3.i it follows from either Theorem 2.5 or Theorem 2.8 that
$n^{-1} \sum_{t=1}^{n} [q_{ti}(\omega, \theta) - E(q_{ti}(\omega, \theta))] \to 0$ a.s. uniformly on $\Theta$. By Assumption A.3.ii,
$\hat\pi_n(\omega) - \pi_n^* \to 0$ a.s. Identifying $(n^{-1} \sum_{t=1}^{n} q_t(\omega, \theta)', \hat\pi_n(\omega)')'$ with $\hat\psi_n(\omega, \theta)$ and
$(n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta))', \pi_n^{*\prime})'$ with $\bar\psi_n(\theta)$, it follows that $\hat\psi_n(\omega, \theta) - \bar\psi_n(\theta) \to 0$ a.s.
uniformly on $\Theta$. The domination conditions on $q_t$ ensure that $E(q_t(\omega, \theta))$ and
therefore $n^{-1} \sum_{t=1}^{n} E(q_t(\omega, \theta))$ are $O(1)$ uniformly in $\theta$; by A.3.ii, $\pi_n^*$ is $O(1)$.
Therefore, $\bar\psi_n(\theta)$ is interior to a compact subset of $\mathbb{R}^{\ell} \times \mathbb{R}^{p}$ uniformly in $n$. As $\{g_n\}$
is continuous uniformly in $n$ it follows from Lemma 2.4 that
$g_n(\hat\psi_n(\omega, \theta)) - g_n(\bar\psi_n(\theta)) \to 0$ a.s. as $n \to \infty$ uniformly on $\Theta$, i.e.
$Q_n(\omega, \theta) - \bar Q_n(\theta) \to 0$ a.s. as $n \to \infty$, setting $Q_n(\omega, \theta) = g_n(\hat\psi_n(\omega, \theta))$ and
$\bar Q_n(\theta) = g_n(\bar\psi_n(\theta))$. Hence Assumption 2.2.ii holds.

Given Assumption A.4, Assumption 2.2.iii is satisfied. Thus, the conditions
of Theorem 2.2 are satisfied, so that $\hat\theta_n(\omega) - \theta_n^* \to 0$ a.s.


PROOF OF COROLLARY 3.1: The result follows by verifying that Assumption B.1
implies Assumption A.2 so that the conditions of Theorem 2.3 hold. This is
trivial because $g_n(\psi, \pi) = -\psi$ is obviously continuous uniformly in $n$,
Assumption A.2.ii holds by Assumption B.1.ii, and Assumption A.2.iii is
irrelevant.


PROOF OF COROLLARY 3.2: That $\hat\theta_{QMLE} - \theta_n^* \to 0$ a.s. follows immediately from Theorem
2.9 since Assumption B.1 implies A.2, and the other conditions are assumed
directly.

PROOF OF COROLLARY 3.3: See White [48], Theorem 4.7, for the demonstration that
$\theta_n^* = \theta^o$.


PROOF OF THEOREM 4.5: Let $\eta_n^c(\varepsilon) = S_n^c(\varepsilon) \cap \Theta$ as in Definition 2.1. For some $\varepsilon > 0$
suppose that

$$\liminf_{n} \Big[\min_{\theta \in \eta_n^c(\varepsilon)} D_n(\bar\psi_n(\theta), 0) - D_n(\bar\psi_n(\theta_n^*), 0)\Big] = 0$$

so that $\theta_n^*$ is not identifiably unique. Because $D_n(\bar\psi_n(\theta_n^*), 0) \to 0$ given
$\bar\psi_n(\theta_n^*) \to 0$ (Assumption 4.5.ii) and $D_n(\bar\psi_n(\theta_n^*), 0) \le h^{-1}(\Delta\|\bar\psi_n(\theta_n^*)\|)$ (Assumption
4.5.i), and because $\inf_{n \ge m} [\min_{\theta \in \eta_n^c} D_n(\bar\psi_n(\theta), 0)]$ is monotone increasing in $m$
we have

$$\inf_{n \ge m} \Big[\min_{\theta \in \eta_n^c} D_n(\bar\psi_n(\theta), 0)\Big] = 0.$$

Thus for any $\xi > 0$ and any $M < \infty$ there exists $n > M$ such that
$\min_{\theta \in \eta_n^c} D_n(\bar\psi_n(\theta), 0) < \xi$. Choosing $\xi < \rho$, it follows that there exists a
sequence $\{\tilde\theta_n\}$ with $\tilde\theta_n$ in $\eta_n^c$ (so that $\|\tilde\theta_n - \theta_n^*\| > \varepsilon$ for all $n$) such that for any
$M < \infty$ there exists $n > M$ for which $D_n(\bar\psi_n(\tilde\theta_n), 0) < \xi < \rho$. This implies there
exists $n > M$ such that $\tilde\theta_n$ is also in $\Theta(\rho, M)$ for any $M < \infty$. Since
$D_n(\bar\psi_n(\theta_n^*), 0) \to 0$, for any $\xi > 0$ there exists $M(\xi)$ sufficiently large so that for
all $n > M(\xi)$, $D_n(\bar\psi_n(\theta_n^*), 0) < \xi$. Further, since $\{D_n\}$ is an asymptotically
uniform sequence of discrepancies by Assumption 4.5.i, it follows that there
exist $\delta > 0$ and $N_1 < \infty$ such that for all $n > N_1$, $\delta\|\bar\psi_n(\theta_n^*)\| \le h(D_n(\bar\psi_n(\theta_n^*), 0))$ and
$\delta\|\bar\psi_n(\tilde\theta_n)\| \le h(D_n(\bar\psi_n(\tilde\theta_n), 0))$. Given $N_0$ as specified in Assumption 4.5.iii, it
follows that for any $\xi$ such that $0 < \xi < \rho$ and some $n > \max(M(\xi), N_0, N_1)$

$$\delta\|\bar\psi_n(\tilde\theta_n)\| + \delta\|\bar\psi_n(\theta_n^*)\| \le h(D_n(\bar\psi_n(\tilde\theta_n), 0)) + h(D_n(\bar\psi_n(\theta_n^*), 0)) < 2h(\xi).$$

By the triangle inequality

$$\|\bar\psi_n(\tilde\theta_n) - \bar\psi_n(\theta_n^*)\| \le \|\bar\psi_n(\tilde\theta_n)\| + \|\bar\psi_n(\theta_n^*)\| < 2h(\xi)/\delta$$

for some $n > \max(M(\xi), N_0, N_1)$ and any $\xi$ such that $0 < \xi < \rho$, where
$\tilde\theta_n \in \Theta(\rho, N_0)$. However, since $\tilde\theta_n$ is also in $\eta_n^c$, and since $h$ is continuous at
zero this violates Assumption 4.5.iii which requires the generalized rank
condition to hold on $\Theta(\rho, N_0)$, a contradiction. Hence $\theta_n^*$ must be identifiably
unique.


PROOF OF COROLLARY 4.6: The result follows by verifying that Assumption C.1 implies Assumption A.2, so that the conditions of Theorem 2.3 hold. This is trivial because $g_n(\psi, \pi) = \psi' P \psi$, where $\pi = \operatorname{vech} P$, is obviously continuous uniformly in $n$; Assumption A.2.ii holds by Assumption C.1.ii, and Assumption A.2.iii holds by Assumption C.1.iii.

PROOF OF COROLLARY 4.7: We verify the conditions of Theorem 2.1. Assumption A.1
is given, Assumption C.1 implies Assumption A.2 as just argued, Assumption C.2
directly  implies  Assumption  A.3»  and  Assumption  A. 4  is  satisfied  for  0*  =  6  as  a 
consequence of Theorem 4.5, given Assumption C.3. We note that Assumptions C.1.i
and  C.^.i  ensure  that 

$$D_n(\psi, 0) = \psi' P_n \psi$$

is  an  asymptotically  uniform  sequence  of  discrepancies. 
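
The role of the weighting matrix in the quadratic-form discrepancy of Corollary 4.7 can be illustrated numerically. The sketch below is not part of the original argument. It assumes a hypothetical 2 x 2 symmetric positive definite weight matrix (`P_EXAMPLE`) and checks the standard eigenvalue bounds $\lambda_{\min} \|\psi\|^2 \le \psi' P \psi \le \lambda_{\max} \|\psi\|^2$ on random draws of $\psi$. Under these assumptions the lower bound gives $\sqrt{\lambda_{\min}}\, \|\psi\| \le h(\psi' P \psi)$ with $h(x) = \sqrt{x}$, which is the kind of inequality used in the identifiability proof above. All function names are illustrative.

```python
import math
import random


def quad_form(psi, P):
    """Return psi' P psi for a 2-vector psi and a 2x2 matrix P (nested lists)."""
    return sum(psi[i] * P[i][j] * psi[j] for i in range(2) for j in range(2))


def eig_sym_2x2(P):
    """Closed-form eigenvalues (min, max) of a symmetric 2x2 matrix."""
    a, b, c = P[0][0], P[0][1], P[1][1]
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return mid - rad, mid + rad


def check_bounds(P, trials=1000, seed=0):
    """Check lam_min*|psi|^2 <= psi' P psi <= lam_max*|psi|^2 on random draws."""
    lam_min, lam_max = eig_sym_2x2(P)
    rng = random.Random(seed)
    for _ in range(trials):
        psi = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        norm_sq = psi[0] ** 2 + psi[1] ** 2
        d = quad_form(psi, P)
        tol = 1e-9 * (1.0 + norm_sq)  # allowance for floating-point rounding
        if not (lam_min * norm_sq - tol <= d <= lam_max * norm_sq + tol):
            return False
    return True


# Hypothetical positive definite weight matrix, chosen only for illustration.
P_EXAMPLE = [[2.0, 0.5], [0.5, 1.0]]
LAM_MIN, LAM_MAX = eig_sym_2x2(P_EXAMPLE)
BOUNDS_HOLD = check_bounds(P_EXAMPLE)
```

If the smallest eigenvalue of the weight matrices were allowed to approach zero along the sequence, the lower bound would degenerate, which is why some uniform control on the weights is needed for the discrepancy to behave uniformly in $n$.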




Footnotes 

¹The authors are grateful to A. Ronald Gallant, Jeffrey Wooldridge, the
editor,  and  especially  to  an  anonymous  referee  for  helpful  comments  and 
suggestions.   Any  remaining  errors  are  attributable  to  the  authors.   This  work 
was  supported  by  NSF  grant  SES83-0655- 


²For $Q_n : \Theta \to \mathbb{R}$, let $A_n = \{(\theta, x) \in \Theta \times \mathbb{R} : Q_n(\theta) \le x\}$. Then $A_n$ is the epigraph of $Q_n$ (Roberts and Varberg [35, p. 80]).


³The operator vech maps the lower triangle of a symmetric $\ell \times \ell$ matrix into
an $\ell(\ell + 1)/2 \times 1$ column vector.
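
As an illustration of the vech operator described in this footnote, the following sketch stacks the lower triangle of a symmetric matrix column by column. The column-by-column ordering is a common convention and is an assumption here, since the footnote does not specify an ordering. The function names `vech` and `unvech` are illustrative only.

```python
import math


def vech(matrix):
    """Stack the lower triangle of a symmetric matrix, column by column."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    if any(matrix[i][j] != matrix[j][i] for i in range(n) for j in range(i)):
        raise ValueError("matrix must be symmetric")
    # Column j contributes the entries in rows j, j+1, ..., n-1.
    return [matrix[i][j] for j in range(n) for i in range(j, n)]


def unvech(vector):
    """Rebuild the symmetric matrix whose vech is the given vector."""
    m = len(vector)
    n = (math.isqrt(8 * m + 1) - 1) // 2
    if n * (n + 1) // 2 != m:
        raise ValueError("length must equal n(n + 1)/2 for some integer n")
    matrix = [[0] * n for _ in range(n)]
    k = 0
    for j in range(n):
        for i in range(j, n):
            matrix[i][j] = vector[k]
            matrix[j][i] = vector[k]  # mirror into the upper triangle
            k += 1
    return matrix


P = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]
PI = vech(P)  # a 3 x 3 matrix yields 3 * 4 / 2 = 6 elements
```

Because the matrix is symmetric, the $\ell(\ell + 1)/2$ stacked elements determine it completely, so `unvech(vech(P))` recovers `P`.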

⁴The distinction between identification and identifiability previously
discussed should be carefully borne in mind here and in what follows.




References 


[1] Akaike, H. Information Theory and an Extension of the Likelihood Principle.
     In B. N. Petrov and F. Csáki (eds.), Proceedings of the Second
     International Symposium on Information Theory. Budapest: Akadémiai
     Kiadó, 1973.
[2] Amemiya, T. Regression Analysis When the Dependent Variable is Truncated
     Normal. Econometrica 41 (1973), 997-1016.
[3] Amemiya, T. The Nonlinear Two-Stage Least-Squares Estimator. Journal of
     Econometrics 2 (1974), 105-110.
[4] Amemiya, T. The Maximum Likelihood and Nonlinear Three Stage Least Squares
     Estimator in the General Nonlinear Simultaneous Equations Model.
     Econometrica 45 (1977), 955-968.
[5] Andersen, E. B. Asymptotic Properties of Conditional Maximum Likelihood
     Estimators. Journal of the Royal Statistical Society, Series B 32
     (1970), 283-301.
[6] Bar-Shalom, Y. On the Asymptotic Properties of the Maximum Likelihood
     Estimate Obtained from Dependent Observations. Journal of the Royal
     Statistical Society, Series B 33 (1971), 72-77.
[7] Bates, C. and H. White. Efficient Estimation of Parametric Models. Johns
     Hopkins Department of Political Economy Discussion Paper (1984).
[8] Berk, R. Limiting Behavior of Posterior Distributions When the Model Is
     Incorrect. Annals of Mathematical Statistics 37 (1966), 51-58.
[9] Berk, R. Consistency A Priori. Annals of Mathematical Statistics 41
     (1970), 894-906.
[10] Bhat, B. R. On the Method of Maximum Likelihood for Dependent Observations.
     Journal of the Royal Statistical Society, Series B 36 (1974), 48-53.




[11] Billingsley, P. Probability and Measure. New York: Wiley, 1979.
[12] Burguete, J., A. R. Gallant and G. Souza. On the Unification of the
     Asymptotic Theory of Nonlinear Econometric Models. Econometric Reviews
     1 (1982), 151-212.
[13] Caines, P. A Note on the Consistency of Maximum Likelihood Estimates
     for Finite Families of Stochastic Processes. The Annals of Statistics
     3 (1975), 539-546.
[14] Cramér, H. Mathematical Methods of Statistics. Princeton: Princeton
     University Press, 1946.
[15] Crowder, M. Maximum Likelihood Estimation for Dependent Observations.
     Journal of the Royal Statistical Society, Series B 38 (1976), 45-53.
[16] Doob, J. L. Probability and Statistics. Transactions of the American
     Mathematical Society 36 (1934), 759-775.
[17] Domowitz, I. and H. White. Misspecified Models with Dependent
     Observations. Journal of Econometrics 20 (1982), 35-58.
[18] Fisher, F. The Identification Problem in Econometrics. New York:
     McGraw-Hill, 1966.
[19] Fisher, R. A. On the Mathematical Foundations of Theoretical Statistics.
     Philosophical Transactions of the Royal Society of London, Series A 222
     (1922), 309-368.
[20] Fisher, R. A. Theory of Statistical Estimation. Proceedings of the
     Cambridge Philosophical Society 22 (1925), 700-725.
[21] Gallant, A. R. Three Stage Least Squares Estimation for a System of
     Simultaneous, Nonlinear, Implicit Equations. Journal of Econometrics
     5 (1977), 71-88.




[22] Gallant, A. R. and H. White. A Unified Theory of Estimation and Inference
     for Nonlinear Dynamic Models. UCSD Department of Economics Discussion
     Paper (1985).
[23] Geary, R. Determination of Linear Relations Between Systematic Parts of
     Variables with Errors of Observation, the Variances of Which are
     Unknown. Econometrica 17 (1949), 30-58.
[24] Hall, P. and C. Heyde. Martingale Limit Theory and Its Applications.
     New York: Academic Press, 1980.
[25] Hansen, L. Large Sample Properties of Generalized Method of Moments
     Estimators. Econometrica 50 (1982), 1029-1054.
[26] Heijmans, R. and J. Magnus. Consistency of Maximum Likelihood Estimators
     when Observations are Dependent. University of Amsterdam Faculty of
     Actuarial Science and Econometrics Report No. AE 11/83 (1983).
[27] Hoadley, B. Asymptotic Properties of Maximum Likelihood Estimators
     for the Independent Not Identically Distributed Case. Annals of
     Mathematical Statistics 42 (1971), 1977-1991.
[28] Huber, P. The Behavior of Maximum Likelihood Estimates Under Nonstandard
     Conditions. In Proceedings of the Fifth Berkeley Symposium in
     Mathematical Statistics and Probability. Berkeley: University of
     California Press, 1967.
[29] Jennrich, R. I. Asymptotic Properties of Non-Linear Least Squares
     Estimators. Annals of Mathematical Statistics 40 (1969), 633-643.
[30] Jorgenson, D. W. and J. Laffont. Efficient Estimation of Nonlinear
     Simultaneous Equations with Additive Disturbances. Annals of Economic
     and Social Measurement 3 (1974), 615-640.
[31] Kullback, S. and R. A. Leibler. On Information and Sufficiency. Annals
     of Mathematical Statistics 22 (1951), 79-86.


[32] Le Cam, L. On Some Asymptotic Properties of Maximum Likelihood Estimates
     and Related Bayes Estimates. University of California Publications in
     Statistics 1 (1953), 277-328.
[33] Levine, D. A Remark on Serial Correlation in Maximum Likelihood. Journal
     of Econometrics 23 (1983), 337-342.
[34] Reiersøl, O. Confluence Analysis by Means of Instrumental Sets of
     Variables. Arkiv for Mathematik, Astronomi och Fysik 32 (1945).
[35] Roberts, A. and Varberg, D. Convex Functions. New York: Academic
     Press, 1973.
[36] Rothenberg, T. Efficient Estimation with a Priori Information. New Haven:
     Yale University Press, 1973.
[37] Silvey, S. A Note on Maximum Likelihood in the Case of Dependent Random
     Variables. Journal of the Royal Statistical Society, Series B 23
     (1961), 444-452.
[38] Wald, A. Asymptotic Properties of the Maximum Likelihood Estimate of an
     Unknown Parameter of a Discrete Stochastic Process. Annals of
     Mathematical Statistics 19 (1948), 40-46.
[39] Wald, A. Note on the Consistency of the Maximum Likelihood Estimate.
     Annals of Mathematical Statistics 20 (1949), 595-601.
[40] Weiss, L. Asymptotic Properties of Maximum Likelihood in Some
     Nonstandard Cases: I. Journal of the American Statistical Association 66
     (1971), 345-350.
[41] Weiss, L. Asymptotic Properties of Maximum Likelihood in Some Nonstandard
     Cases: II. Journal of the American Statistical Association 68 (1973),
     428-430.
[42] White, H. Using Least Squares to Approximate Unknown Regression Functions.
     International Economic Review 21 (1980), 149-170.


[43] White, H. Nonlinear Regression on Cross-Section Data. Econometrica 48
     (1980), 721-746.
[44] White, H. Consequences and Detection of Misspecified Nonlinear Regression
     Models. Journal of the American Statistical Association 76 (1981),
     419-433.
[45] White, H. Maximum Likelihood Estimation of Misspecified Models.
     Econometrica 50 (1982), 1-26.
[46] White, H. Asymptotic Theory For Econometricians. New York:
     Academic Press, 1984.
[47] White, H. Maximum Likelihood Estimation of Misspecified Dynamic Models.
     UCSD Department of Economics Discussion Papers, 1984.
[48] White, H. Estimation, Inference and Specification Analysis. New York:
     Cambridge University Press, forthcoming.
[49] White, H. and I. Domowitz. Nonlinear Regression with Dependent
     Observations. Econometrica 52 (1984), 143-162.
[50] Wooldridge, J. A General Theory of Estimation and Inference for Parametric
     Models. UCSD Department of Economics Doctoral Dissertation, 1985.



